ABSTRACT



A cache memory system comprises a cache 4, a prefetch 
store 5, and a memory controller 3. The controller 3 receives 
requests from a processor 1 for access to lines of data stored 
in a memory 2 and maintains priority data indicative of the 
relative priority of lines of data stored in the cache 4. The 
controller 3 responds to receipt of a processor request for 
access to data in a line N such that: for a cache hit, the 
controller supplies the data from the cache 4 to the processor 
1; for a cache miss when line N is not stored in the prefetch 
store 5, the controller 3 retrieves line N from the memory, 
and controls storage of the line in the cache 4 and supply of 
the data to the processor 1, the priority data for line N being 
set to a high relative priority; for a cache miss when line N 
is stored in the prefetch store 5, the controller 3 transfers line 
N from the prefetch store 5 to the cache 4 and supplies the 
data to the processor 1, the priority data for line N being set 
to a low relative priority; and for both a cache hit and a cache 
miss, the controller 3 prefetches the sequentially next line 
N+1 from the memory 2 to the prefetch store 5. Prefetching
is preferably only performed for a defined subset of the lines 
in the memory. 
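
The request-handling policy summarized above can be modelled in software to show its effect. The following is a minimal Python sketch, not the patented hardware. It assumes a fully associative cache whose LRU-to-MRU ordering stands in for the priority data, a one-line prefetch store, a list indexed by line number standing in for memory 2, and a hypothetical `threshold` that defines the prefetchable subset as every line number below it. The class and method names are illustrative only.

```python
from collections import OrderedDict


class PrefetchCacheController:
    """Toy model of memory controller 3 (illustrative names, not from the source).

    Assumptions: a fully associative cache of `cache_lines` lines whose
    OrderedDict order is the priority data (first = LRU, last = MRU), a
    one-line prefetch store, `memory` indexed by line number, and a
    prefetchable subset consisting of every line number below `threshold`.
    """

    def __init__(self, memory, cache_lines, threshold):
        self.memory = memory
        self.capacity = cache_lines
        self.threshold = threshold
        self.cache = OrderedDict()      # line number -> data, LRU first
        self.prefetch_store = None      # (line number, data) or None
        self.demand_fetches = 0         # memory reads the processor waits for
        self.prefetches = 0             # memory reads made in advance

    def _store_in_cache(self, n, data, high_priority):
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)          # evict the LRU line
        self.cache[n] = data                        # new keys land at MRU
        if not high_priority:
            self.cache.move_to_end(n, last=False)   # demote to LRU

    def _in_prefetch_store(self, n):
        return self.prefetch_store is not None and self.prefetch_store[0] == n

    def _prefetch(self, m):
        # Prefetch only lines in the subset that are not already held.
        if m >= self.threshold or m >= len(self.memory):
            return
        if m in self.cache or self._in_prefetch_store(m):
            return
        self.prefetches += 1
        self.prefetch_store = (m, self.memory[m])

    def request(self, n):
        if n in self.cache:                         # cache hit
            self.cache.move_to_end(n)               # accessed line becomes MRU
            data = self.cache[n]
        elif self._in_prefetch_store(n):            # miss, line N was prefetched
            data = self.prefetch_store[1]
            self.prefetch_store = None
            self._store_in_cache(n, data, high_priority=False)
        else:                                       # miss, read main memory
            self.demand_fetches += 1
            data = self.memory[n]
            self._store_in_cache(n, data, high_priority=True)
        self._prefetch(n + 1)                       # on hits and misses alike
        return data


if __name__ == "__main__":
    mem = [f"line{i}" for i in range(64)]
    ctrl = PrefetchCacheController(mem, cache_lines=2, threshold=10)
    ctrl.request(50)                # outside the subset: line 51 is not prefetched
    for n in (0, 1, 2):             # 0 from memory, 1 and 2 from the prefetch store
        ctrl.request(n)
    print(list(ctrl.cache))         # [2, 0]
```

In this two-line cache, line 0 cost a demand fetch and keeps its MRU position, while the prefetched lines 1 and 2 each enter at the LRU position and are the first candidates for eviction.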

33 Claims, 2 Drawing Sheets 

LOADING ACCESSED DATA FROM A PREFETCH BUFFER TO A LEAST RECENTLY USED POSITION IN A CACHE

FIELD OF THE INVENTION

The present invention relates generally to cache memory systems and provides apparatus and methods for facilitating access by a processor to data stored in a memory.

BACKGROUND OF THE INVENTION

In processing systems such as computers, the data to be utilized by a processor is stored in a main memory and control logic manages the transfer of data between the memory and the processor in response to requests issued by the processor. The data stored in the main memory generally includes both instructions to be executed by the processor and data to be operated on by the processor. For simplicity, both instructions and true data are referred to collectively herein as data unless the context otherwise requires. The time taken by a main memory access is relatively long in relation to operating speeds of modern processors. To address this, a cache memory with a shorter access time is generally interposed between the main memory and the processor, and the control logic manages the storage of data retrieved from the main memory in the cache and the supply of data from the cache to the processor. The cache is organized into multiple "lines", each line providing storage for a block, or line, of data from the main memory which may be many bytes in length. When the processor issues a request for data in a line N, the control logic determines whether that line is stored in the cache. If so, i.e. if there is a cache hit, the data is retrieved from the cache. If not, i.e. if there is a cache miss, the data must be retrieved from the main memory and the processor is stalled while this operation takes place. Since a cache access is much faster than a main memory access, it is clearly desirable to manage the system so as to achieve a high ratio of cache hits to cache misses. Increasing the size of the cache makes this task easier, but cache memory is expensive in comparison to the slower main memory. It is therefore important to use cache memory space as efficiently as possible.

In conventional cache memory systems, a line of data retrieved from the memory following a cache miss is stored in the cache, overwriting a previously stored line which is selected for eviction by the control logic in accordance with a priority system. The priority system indicates the relative priority of lines of data stored in the cache, with low priority lines being selected for eviction before higher priority lines. The control logic implements the priority system by maintaining priority data indicative of the current priorities of the stored lines. Various priority systems are known, though the generally favoured technique is a Least Recently Used system, where the control logic maintains data indicating relatively how recently lines stored in the cache have been accessed by the processor. The least recently used (LRU) line is selected for eviction first when space is required for a new line, and this line then becomes the most recently used (MRU) line when it is read out to the processor. Whatever the priority system employed, it is desirable to utilize the cache memory space so as to reduce processor stall time due to main memory accesses as far as possible.

In practice, the effectiveness of current cache memory systems is dependent on the nature of the processing application. For example, real time multithreaded applications, such as in a storage controller environment, have an execution profile which is unfriendly to the operation of a conventional cache. Execution of instructions in these environments is driven by external events, such as a new host I/O arriving or a disk I/O completing, and these events tend to occur in a random order. Each event calls for a particular sequence of instructions to be executed a single time. These instructions are not then executed again until the next time the event occurs. This lack of repetition means that a cache may not be effective unless an event repeats within the lifetime of lines stored in the cache. If the set of all events is large, and the code that is executed for each event is mostly unique, then the full set of code will exceed the available cache memory space. Thus, an instruction will only be executed once before it is evicted from the cache and replaced by another instruction for another event. This means that the cache is not effective in improving the instruction throughput of the processor.

One way to improve the efficiency of a cache memory system is to attempt to anticipate processor requests and retrieve lines of data from the memory in advance. This technique is known as prefetching. U.S. Pat. No. 5,566,324 discloses such a memory system in which, in the event of a main cache miss, a current line is retrieved from memory and the sequentially next line is retrieved and stored in a prefetch cache. If the prefetched line is requested next by the processor, this line is then loaded to the main cache and supplied to the processor, so that a main memory access is avoided. U.S. Pat. No. 4,980,823 discloses another prefetching system which, rather than using a separate prefetch store as in U.S. Pat. No. 5,566,324, prefetches lines directly into an LRU location of the cache. Known prefetching systems improve performance to some extent in applications where lines are called sequentially by the processor. However, in many systems with more complex processing requirements, the effectiveness of current prefetching systems is limited. The applications described above in relation to a storage controller provide an example. While the use of known prefetching systems in this environment will save some processor stall time for main memory accesses, i.e. for the sequentially called lines within a particular section of code, this does not improve the overall performance of the memory system.

DISCLOSURE OF THE INVENTION

According to one aspect of the present invention there is provided a cache memory system for facilitating access by a processor to lines of data stored in a memory, the system comprising: a cache for storing lines of data for access by the processor; a prefetch store for storing lines of data to be transferred to the cache; and a memory controller for receiving processor requests for access to lines of data and retrieving lines of data from the memory, the memory controller maintaining priority data indicative of the relative priority of lines of data stored in the cache. The memory controller is configured to respond to receipt of a processor request for access to data in a line N such that: in the case of a cache hit, the memory controller controls supply of that data from the cache to the processor; in the case of a cache miss when line N is not stored in the prefetch store, the memory controller retrieves line N from the memory, and controls storage of the line in the cache and supply of the data to the processor, the priority data for line N being set to a high relative priority; in the case of a cache miss when line N is stored in the prefetch store, the memory controller transfers line N from the prefetch store to the cache and controls supply of the data to the processor, the priority data for line N being set to a low relative priority; and for both a cache hit and a cache miss, the memory controller
prefetches the sequentially next line N+1 from the memory to the prefetch store.

Thus, in embodiments of the present invention, prefetching is performed on both a cache hit and a cache miss, and prefetched data lines which are then requested and supplied to the processor are stored in the cache with a lower priority than lines retrieved directly from the main memory. As a result, lines which generally cannot be prefetched successfully are retained in the cache in favour of lines which can be prefetched successfully. Those lines which are called out of sequence, e.g. the branch targets for the working set of a piece of code, will therefore tend to be retained in the cache. Because prefetching is performed for cache hits as well as cache misses, a line which can be prefetched successfully is generally always retrieved by prefetching. Thus, in operation of the system, processor stall time for main memory accesses is substantially reduced.

A highly efficient cache memory system is therefore achieved, reducing processor stall time for main memory accesses and allowing a smaller cache to be significantly more effective than previously. For example, considering the system of U.S. Pat. No. 5,566,324 mentioned above, that system performs prefetching only on main cache misses, and prefetched lines which are then requested and supplied to the processor are stored in the main cache conventionally, taking a high priority (MRU) status. Successfully prefetched lines therefore compete for cache space with the more valuable lines which had to be retrieved directly from main memory. Further, lines which could have been prefetched successfully may have to be retrieved directly from memory if the preceding line resulted in a cache hit. Thus, stall time for main memory accesses will be high unless a large cache is employed. Similarly, while U.S. Pat. No. 4,980,823 loads a prefetched line directly to a low priority (LRU) cache location, if that line is then requested by the processor it will then be accorded a high priority (MRU). Again, therefore, successfully prefetched lines compete for cache space with the more valuable lines which had to be accessed directly from main memory, and cache efficiency is limited in comparison to embodiments of the present invention.

In preferred embodiments of the invention the memory controller prefetches line N+1 only for lines in a defined subset of the lines stored in the memory. For example, before implementing a prefetch, the memory controller may perform a test to see if line N+1 (or line N, depending on implementation) is in this subset and omit the prefetch for a negative result. The subset of lines for which prefetching is performed can be defined in dependence on the nature of the data lines, in that lines which are deemed suitable for prefetching are included in the subset and lines deemed unsuitable for prefetching are not. The types of lines which are suitable for prefetching will be apparent to those skilled in the art, but as an example, a series of lines which is long, infrequently accessed and in which the lines will be called sequentially is particularly suitable for prefetching. On the other hand, for example, a series of lines which is short, accessed frequently or contains looped instructions is generally unsuitable for prefetching. The subset of prefetchable lines could be defined in the system in various ways, for example by a dedicated flag in the processor request which is detected by the memory controller. In some applications, the subset may be defined dynamically, changing during operation of the system. Preferably, the subset of lines corresponds to a particular region of the memory, the set (or sets) of addresses in this region being defined in the memory controller. In a particularly simple implementation, the subset corresponds to those lines in memory addresses either above or below a defined threshold address, and the memory controller simply checks the line address against the threshold address to determine if a prefetch should be performed.

To reduce prefetch traffic on the system bus, the memory controller preferably performs a prefetch after confirming that the line to be fetched is not already stored in the cache or the prefetch store. If it is, prefetching is unnecessary, though operation of the system would otherwise be unaffected if the prefetch step were performed.

The memory controller preferably maintains the priority data in accordance with a Least Recently Used system. In particular, it is preferred that lines supplied from the prefetch buffer to the processor assume LRU status in the cache, and lines loaded directly from memory to the processor assume MRU status in the cache.

Another aspect of the present invention provides processing apparatus comprising a processor, a memory for storing lines of data to be accessed by the processor, and a cache memory system according to the first aspect of the invention coupled between the processor and the memory. The cache memory system in this apparatus may include one or more of the preferred features mentioned herein.

A further aspect of the present invention provides a method for facilitating access by a processor of a data processing system to lines of data stored in a memory of the system, wherein the system includes a cache for storing lines of data for access by the processor and a prefetch store for storing lines of data to be transferred to the cache, and wherein the processor generates requests for data to which access is required, the method comprising maintaining in the system priority data indicative of the relative priority of lines of data stored in the cache, and responding to a processor request for access to data in a line N by: determining whether line N corresponds to a cache hit or a cache miss; in the case of a cache hit, supplying the requested data to the processor; in the case of a cache miss, determining whether line N is stored in the prefetch store; when line N is not stored in the prefetch store on a cache miss, retrieving line N from the memory, and storing the line in the cache and supplying the requested data to the processor, the priority data for line N being set to a high relative priority; when line N is stored in the prefetch store on a cache miss, transferring line N from the prefetch store to the cache and supplying the requested data to the processor, the priority data for line N being set to a low relative priority; and for both a cache hit and a cache miss, prefetching the sequentially next line N+1 from memory to the prefetch store.

In general, it is to be understood that, where features are described herein with reference to an apparatus embodying the invention, corresponding features may be provided in a method embodying the invention, and vice versa.

BRIEF DESCRIPTION OF THE DRAWINGS

Preferred embodiments of the invention will now be described, by way of example, with reference to the accompanying drawings in which: FIG. 1 is a schematic block diagram of processing apparatus embodying the invention; and FIG. 2 is a flow chart illustrating operation of the apparatus of FIG. 1.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The processing apparatus of FIG. 1 comprises a processor 1 and a main memory 2, implemented in DRAM, in which lines of data to be used by the processor are stored. A
memory controller 3 is connected between the processor 1 and memory 2. A cache 4, implemented in SRAM, is connected to the processor 1 and memory controller 3, and a prefetch store in the form of a prefetch buffer 5 is connected to the memory controller 3. The memory controller 3 comprises logic for controlling the transfer of lines of data from the memory 2 to the processor 1 and the storage of lines of data in the cache 4 and prefetch buffer 5. The logic elements comprise a cache & prefetch control unit 6, a memory access unit 7, and a priority data indicator, in the form of a set of priority counters 8, in which priority data indicating the relative priority of lines of data stored in the cache is maintained by the control unit 6. The logic elements 6 to 8 of the memory controller 3 are connected as shown in the figure, with the cache 4 being connected to the control unit 6 and the prefetch buffer 5 connected to the memory access unit 7. While the logic elements 6 to 8 are shown as separate units in the figure, it will be understood that these elements can be implemented in hardware or software and may be integrated together or with other functional elements. Detailed logic for implementing the various elements will be apparent to those skilled in the art from the following description.

In the present embodiment, the processor 1 is integrated with cache 4, memory controller 3 and prefetch buffer 5 in a microprocessor indicated generally at 9. Here, therefore, cache 4 constitutes a first level cache for the processor 1, though in other embodiments it may be a lower level cache. Also, in other embodiments the memory 2 may be a memory other than the main memory, such as a second level cache.

Modern microprocessors often have separate instruction and data cache systems. The present embodiment will be described in the context of such a system, with the cache 4 serving as the instruction cache of the processor. The invention can be applied to particular advantage in such a system, but it will be apparent that the invention can also be applied in systems where the cache 4 is not integrated with the processor and/or where the cache is a data cache or is used for both data and instructions.

In this embodiment, the code to be executed by the processor 1 is stored in the memory 2 as a series of blocks or lines of instructions at sequential address locations. The processor 1 generates requests for instructions as they are required for execution. A request indicates the address of the line containing the required instruction and is supplied to the control unit 6. The control unit 6 then determines if the requested line is stored in the cache 4. While the cache 4 could be a direct mapped cache or a fully associative cache, in this embodiment the preferred choice is a set associative cache, which is often used in the embedded environment and the control logic for which is relatively simple. In particular, the cache 4 in this embodiment is a 4-way set associative cache which therefore has four cache locations in each of multiple sets of cache lines. Lines stored in the cache 4 are accessed in known manner by the control unit 6 using a tag directory stored in the unit 6. The line address indicates a particular set of the cache, and the tag directory indicates the real address for each line stored in the corresponding set in the cache. In the event of a cache hit, the control unit 6 implements read out of the line from the cache to the processor in known manner.

The control unit 6 also controls the priority counters so that the counts reflect the appropriate relative priority of lines stored in the cache at any time. The priority counters could be implemented in various ways as will be apparent to those skilled in the art. In this preferred embodiment one priority counter 8 is provided for each cache line and the counts are maintained by the control unit 6 in accordance with a least recently used system. In a conventional cache implementing an LRU system, the counts registered by the counters indicate relatively how recently the associated cache lines were accessed by the processor: the lower the count, the more recent the access. Each time a line is read out to the processor, the counter associated with that cache line is reset to zero, and all counters (or those counters in the same set, for a set associative cache) which register a lower count than that previously registered by the reset counter are incremented by one. Thus, the counter registering a zero count corresponds to the MRU line, and the counter registering the highest count corresponds to the LRU line. Generally the LRU line will be overwritten when space is needed for storage of a new line, this line then becoming the MRU line, and so on. In the present apparatus, the control unit 6 controls the counters 8 in accordance with this known system, but with one important difference in the case of lines loaded to the cache from the prefetch buffer 5, as will be explained below.

The prefetch buffer 5 in this embodiment is a simple store-and-forward buffer which is one cache line wide. In other embodiments the prefetch store may be a cache or other multiple line storage device, but this is not necessary for successful operation of the system. Further embodiments may integrate the prefetch store with the cache, for example using a dedicated cache location, which is not used for other data, as the prefetch store. However, the present embodiment provides a particularly simple implementation in which the cache is not polluted with prefetched instructions which are not then used by the processor.

The memory access unit 7 operates to access the main memory 2 to retrieve instruction lines when required. When a main memory access is required, the control unit 6 supplies the address of the required line to the memory access unit 7. The memory access unit is capable of accessing the memory 2 in two modes, which will be referred to herein as a single line mode and a dual line mode. In the single line mode, the unit 7 accesses the memory 2 in the usual way to retrieve a single line at the address supplied by the control unit 6. In the dual line mode, the unit 7 accesses the memory to retrieve two lines, namely the line at the address supplied by the control unit 6 and also the line at the sequentially next address in the memory 2. (As will be apparent to those skilled in the art, the dual line mode can be implemented by a DRAM burst mode access in known manner.) The appropriate access mode is indicated to the memory access unit by a flag which is set to one of two states by the control unit 6 and is supplied to the memory access unit 7 together with the line address.

In addition to controlling operation of the cache 4 and memory access unit 7, the control unit 6 controls storage and retrieval of lines in the prefetch buffer 5. The prefetch buffer 5 is used for storing lines which have been retrieved from memory 2 in advance of a processor request for access to that line. In particular, in response to a processor request for instructions in a given line, the sequentially next line will be prefetched if certain conditions are satisfied. In this embodiment, prefetching is only performed for lines in an "active region" of the memory 2. The active region in this example is the address region of the memory 2 corresponding to addresses up to a threshold address which is defined in the control unit 6. The threshold address in this embodiment represents the sequentially next address after the last address in the active region, so that the state of a single bit in line addresses indicates whether the line is in the active or inactive region.
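
The counter rule for the priority counters 8 and the single-bit active-region test can likewise be sketched in Python. This is an illustrative model under stated assumptions, not the embodiment's logic: it covers one set of the 4-way cache with all four ways valid (so the counts always form a permutation of 0 to 3), a hypothetical 32-byte line size, and a hypothetical threshold at bit 20 with all line addresses below `1 << 21`, so that bit 20 alone separates the active region (bit clear) from the inactive region (bit set).

```python
LINE_SIZE = 32        # hypothetical line size in bytes
THRESHOLD_BIT = 20    # hypothetical: active region is addresses below 1 << 20


class SetPriorityCounters:
    """Per-way LRU counters for one full set of a 4-way set associative cache.

    Count 0 marks the MRU way and the highest count marks the LRU way.
    """

    def __init__(self, ways=4):
        self.counts = list(range(ways))   # way 0 starts as MRU, last way as LRU

    def lru_way(self):
        return self.counts.index(max(self.counts))

    def touch(self, way):
        """Line in `way` is read out to the processor: reset it to 0 and
        increment every counter that was lower than its previous count."""
        old = self.counts[way]
        for w, c in enumerate(self.counts):
            if c < old:
                self.counts[w] = c + 1
        self.counts[way] = 0

    def fill_from_memory(self):
        """A line fetched from memory overwrites the LRU way and becomes MRU."""
        way = self.lru_way()
        self.touch(way)
        return way

    def fill_from_prefetch(self):
        """A line from the prefetch buffer overwrites the LRU way and stays LRU.

        The counters are left unchanged, so the same way is evicted first.
        """
        return self.lru_way()


def next_line_in_active_region(line_addr):
    """Is line N+1 in the active region, i.e. is its threshold bit clear?"""
    next_addr = line_addr + LINE_SIZE
    return ((next_addr >> THRESHOLD_BIT) & 1) == 0
```

Because `fill_from_prefetch` leaves the counts alone, the next miss in that set evicts the prefetched line before any line that had to be fetched from memory, unless the processor reads the prefetched line again and `touch` promotes it to MRU.
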

In accordance with this embodiment, instructions which are determined at the design stage to be suitable for prefetching are stored in the active region of the memory. Instructions which are identified as unsuitable for prefetching are stored in addresses outside this region. More particularly, during the design stage the system designer can assess the characteristics of separate sections of code (usually a whole function at a time) and classify them according to run length, frequency of invocation, and sequential vs. looping execution. Further, the designer is capable of locating code segments with a fine degree of control. Certain segments of code, such as those which are long, which are largely sequential, which take few long branches forwards, and which are infrequently accessed, are placed in the active region of the memory. Other segments of code, such as those which are short, looping, or accessed frequently, are placed in the inactive region. This division of lines between the active and inactive regions of the memory enables the effectiveness of the memory system to be greatly magnified. This is even more so when, as is preferred, the design stage includes deliberately laying out certain code segments for sequential execution, and placing such segments in specific code regions. Using the placement and inlining/straightlining capabilities of modern compiler and linker toolsets, the code can be optimized and suitable code placed in the active region of the memory. This is extremely effective in enabling high performance code execution with a small cache.

Operation of the apparatus will now be described in detail with reference to the flowchart of FIG. 2. The process starts at step 10 when the processor 1 issues a request for instructions in a line, say line N. The request is supplied to the control unit 6, which determines, in step 11, whether there is a cache hit. If so, the process proceeds to step 12, in which the control unit 6 accesses the cache so that line N is read out to the processor 1. The control unit then updates the priority counters 8 in the usual way. That is, the counter associated with line N is reset to zero, indicating MRU status, and those counters in the same set of the cache which register a lower count than the value previously held by the reset counter are incremented. Operation then proceeds to step 13, in which the control unit 6 determines whether the sequentially next line N+1 lies in the active region of the memory. This is done by incrementing the requested address for line N to obtain the line N+1 address, and then checking the state of the address bit which corresponds to the threshold of the active region. If the threshold bit is "1", then N+1 is outside the active region and the process is complete. If the threshold bit is "0", then line N+1 is in the active region and operation proceeds to step 14.

In step 14, the control unit 6 checks whether line N+1 is already stored in the cache or the prefetch buffer 5. If so, no further action is required. If not, then the process proceeds to step 15, in which prefetching of line N+1 is performed. Specifically, the control unit 6 supplies the address for line N+1 to the memory access unit 7, setting the mode flag to indicate the single line access mode. The memory access unit then retrieves line N+1 from the memory 2 in known manner and forwards the line to the control unit 6. The control unit 6 loads line N+1 to the prefetch buffer and the operation is complete.

Returning to step 11, if a cache miss is obtained in this step, operation proceeds to step 17, in which the control unit 6 checks whether line N is currently stored in the prefetch buffer 5. If so, i.e. for a prefetch buffer hit, the control unit 6 loads line N to the cache. If the appropriate cache set for storing line N is not full, the line will be stored in the next available cache location. Otherwise, the LRU line in the set will be overwritten by line N. Line N is then read out to the processor 1. Normally, when a line is read out to the processor, the priority counters would be updated as described above to reflect the newly accessed line as the MRU line. In this case, however, since line N was supplied from the prefetch buffer 5, line N is to be assigned LRU status. The priority counters are therefore maintained in their previous states, with the counter for line N indicating line N as the LRU line, so the control unit 6 does not need to actively reset the counters in this case. In this way, a line supplied to the processor from the prefetch buffer is stored in the cache with a low priority. From step 18, the operation proceeds to step 13, in which the conditions for prefetching line N+1 are checked and prefetching is performed or not as already described.

Returning to step 17, if there is a prefetch buffer miss here, then the currently requested line N is not available from either the cache or the prefetch buffer, and a main memory access is required. Operation then proceeds to step 19, in which the control unit 6 determines whether line N+1 is in the active region of the memory in the same way as in step 13. If N+1 is not in the active region, prefetching is not required and operation proceeds to step 20. In step 20, the control unit 6 supplies the address for line N to the memory access unit 7, setting the mode flag to indicate a single line access. The memory access unit 7 retrieves line N from the memory and forwards the line to the control unit 6. The control unit 6 then loads line N to the cache, evicting the LRU line from the appropriate set if all set locations are occupied. Line N is then read out to the processor 1, the priority counters 8 are updated in the usual way to reflect line N as the MRU line, and the operation is complete.

However, if at step 19 it is determined that line N+1 is in the active region, then the process proceeds to step 21, in which, as in step 14, the control unit 6 checks whether N+1 is already stored in the cache or the prefetch buffer 5. If so, prefetching is not required and the operation reverts to step 20 and continues as previously described. If there is a negative result at step 21, then both retrieval of line N and prefetching of line N+1 are required, and operation proceeds to step 22. Here, the control unit 6 supplies the address of line N to the memory access unit 7, but this time sets the mode flag to indicate a dual line access. The unit 7 then accesses the memory 2, retrieves line N and forwards this line to the control unit 6. In the same access process, the memory access unit then retrieves line N+1 and forwards this line to the control unit 6. On receipt of line N from the memory access unit 7, the control unit loads the line to the cache, overwriting the LRU line in the appropriate set as required, and line N is then read out to the processor 1. The control unit then updates the priority counters in the usual way to reflect line N as the MRU line. On receipt of line N+1 from the memory access unit 7, the control unit stores line N+1 in the prefetch buffer 5, and the operation is complete.

As described above, the apparatus operates to assign lines which are supplied from the prefetch buffer to the processor a low priority, here LRU status, in the cache. Lines which are supplied to the processor following a main memory access for a current request are assigned a high priority, here MRU status, in the cache. Thus lines which have not been successfully prefetched are given precedence in the cache over lines which have been successfully prefetched. This means that the more valuable lines, for which a main memory access was required while the processor was waiting, are retained in the cache in favour of lines which were accessed much more quickly from the prefetch buffer 5. Further, the
prefetching of instructions in the active region is performed even for a cache hit on the current instruction. Thus, a line which can be prefetched successfully is generally retrieved by prefetching, even if the immediately preceding line was already available in the memory controller 3. If prefetching of a line N+1 were only performed on a line N miss, then if N is an instruction which was successfully prefetched previously but had survived in the cache, a request for line N+1 may result in a cache miss, and hence a main memory access, even though N+1 could have been prefetched successfully. Consideration of the above shows that the operation tends to leave the cache containing lines which were not prefetched, these being the lines which had the highest cost in terms of processor stall time. These lines were not called sequentially, and may have been loaded as a result of a branch instruction or an interrupt handler, for example. The processor thus spends less time stalled for main memory accesses and so achieves a higher rate of instructions executed. If the cache 4 is large enough to store all the branch targets for the working set of a piece of code, then the system allows the processor to suffer no stall time in executing instructions, and this with significantly less cache memory space than is required to hold the entire working set of the code. In general, a small cache can be made as effective as a cache several times its size. This is beneficial in many applications, and can be of particular assistance in embedded applications where the designer has the opportunity to define the first level memory system but can only afford a small cache in total. Further benefits arise, particularly in the embedded environment, from having just a single, integrated cache as in the apparatus of FIG. 1. This represents a simple design, and saves development time as well as silicon area, which reduces product cost. Thus memory systems embodying the invention can be very effective in enabling high performance processor operation with a small amount of cache and cheap external memory.
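The request-handling policy described above can be modelled in a few lines of software. The following Python sketch is a simplified, hypothetical model rather than the hardware of FIG. 1: each set of the set-associative cache is kept as a list in priority order (MRU first, LRU last) instead of with counters, addresses are line numbers, the prefetch buffer holds a configurable number of lines (one by default), the active region is assumed to be the lines whose threshold bit (`active_region_bit`, an assumed parameter) is 0, and the dual line access of step 22 is modelled as two separate memory reads. All class, method, and parameter names are illustrative.

```python
from collections import OrderedDict


class PrefetchingCacheController:
    """Toy model of memory controller 3 (hypothetical names throughout).

    Each cache set is a list of (line, data) pairs in priority order:
    index 0 is the MRU line and the last entry is the LRU line.
    """

    def __init__(self, memory, num_sets=4, ways=2,
                 active_region_bit=8, prefetch_slots=1):
        self.memory = memory                        # dict: line number -> data
        self.num_sets = num_sets
        self.ways = ways
        self.active_region_bit = active_region_bit  # assumed threshold bit
        self.prefetch_slots = prefetch_slots
        self.sets = [[] for _ in range(num_sets)]
        self.prefetch = OrderedDict()               # prefetch buffer, oldest first
        self.memory_reads = 0                       # every line read from memory
        self.demand_misses = 0                      # reads the processor waits for

    def _set_for(self, line):
        return self.sets[line % self.num_sets]

    def _in_cache(self, line):
        return any(tag == line for tag, _ in self._set_for(line))

    def _in_active_region(self, line):
        # Assumed encoding: threshold bit 0 means "inside the active region".
        return ((line >> self.active_region_bit) & 1) == 0

    def _read_memory(self, line):
        self.memory_reads += 1
        return self.memory[line]

    def _install(self, line, data, high_priority):
        s = self._set_for(line)
        if len(s) == self.ways:
            s.pop()                                 # evict the current LRU line
        if high_priority:
            s.insert(0, (line, data))               # MRU position
        else:
            s.append((line, data))                  # LRU position

    def request(self, n):
        """Return the data for line n, updating cache and prefetch buffer."""
        s = self._set_for(n)
        for i, (tag, data) in enumerate(s):
            if tag == n:                            # cache hit: promote to MRU
                s.insert(0, s.pop(i))
                break
        else:
            if n in self.prefetch:                  # prefetch buffer hit: LRU
                data = self.prefetch.pop(n)
                self._install(n, data, high_priority=False)
            else:                                   # main memory access: MRU
                self.demand_misses += 1
                data = self._read_memory(n)
                self._install(n, data, high_priority=True)
        # Prefetch line N+1 on hits and misses alike, if it is in the active
        # region and not already held in the cache or the prefetch buffer.
        nxt = n + 1
        if (nxt in self.memory and self._in_active_region(nxt)
                and not self._in_cache(nxt) and nxt not in self.prefetch):
            if len(self.prefetch) == self.prefetch_slots:
                self.prefetch.popitem(last=False)   # drop the oldest prefetch
            self.prefetch[nxt] = self._read_memory(nxt)
        return data


if __name__ == "__main__":
    mem = {i: f"insn{i}" for i in range(16)}
    ctrl = PrefetchingCacheController(mem)
    for line in range(6):                           # straight-line code 0..5
        ctrl.request(line)
    ctrl.request(8)                                 # branch target: demand miss
    print(ctrl.demand_misses)                       # 2
    print([tag for tag, _ in ctrl.sets[0]])         # [8, 0]
```

In this run only line 0 and the branch target, line 8, cost a demand miss. Lines 1 to 5 arrive from the prefetch buffer at the LRU position, so when line 8 needs room in set 0 it displaces the prefetched line 4 and the non-prefetched line 0 survives, which mirrors the retention behaviour argued for above. Keeping each set as an ordered list is only a modelling convenience; it carries the same ordering information as the per-set priority counters.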

While embodiments of the invention have been described in detail above, it will be apparent to those skilled in the art that many variations and modifications can be made to the embodiments described without departing from the scope of the invention. For example, while a least recently used priority system is adopted in the above apparatus, the same principles can be applied where other priority systems are used. Further, in response to a cache miss for a line requested by the processor in the above apparatus, that line is loaded to a cache location and then read out to the processor. In other embodiments, loading of the cache line and supply of the line to the processor may be performed in parallel, though this would require modification of the usual cache line loader.
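In the apparatus described, the least recently used priority data is held as counters for each cache set: when a line is accessed its counter is reset, and the counters in the same set which registered a lower count than the reset counter's previous value are incremented. The sketch below models that scheme for a single set. It assumes that count 0 marks the MRU line and the highest count marks the LRU line, and that a line loaded from the prefetch buffer simply enters the set with the highest count; `CounterLRUSet` and its methods are illustrative names, not taken from the source, and re-filling a tag that is already present is not handled.

```python
class CounterLRUSet:
    """One cache set with per-line age counters (hypothetical model).

    Count 0 marks the MRU line; the highest count marks the LRU line.
    The counts in a set always form the sequence 0 .. len(set) - 1.
    """

    def __init__(self, ways):
        self.ways = ways
        self.count = {}                       # line tag -> age counter

    def touch(self, tag):
        """Reset tag's counter to 0 (MRU) and age the lines that were newer."""
        old = self.count[tag]
        for other, c in list(self.count.items()):
            if c < old:                       # lower than the old value
                self.count[other] = c + 1
        self.count[tag] = 0

    def victim(self):
        """Return the LRU line, i.e. the one with the highest count."""
        return max(self.count, key=self.count.get)

    def fill(self, tag, mru):
        """Load a new line at the MRU position if mru, else at the LRU one."""
        if len(self.count) == self.ways:
            del self.count[self.victim()]     # overwrite the LRU line
        self.count[tag] = len(self.count)     # enters with the highest count
        if mru:
            self.touch(tag)


if __name__ == "__main__":
    cset = CounterLRUSet(ways=4)
    for t in ("A", "B", "C"):                 # lines fetched on demand misses
        cset.fill(t, mru=True)
    cset.fill("P", mru=False)                 # line taken from the prefetch buffer
    cset.fill("D", mru=True)                  # next demand miss evicts "P"
    print(cset.count)                         # {'A': 3, 'B': 2, 'C': 1, 'D': 0}
```

Because a prefetched line enters with the highest count, it is the first candidate for replacement unless the processor touches it again. Another priority system could be substituted by changing only how `fill` and `touch` assign priorities.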

What is claimed is: 

1. A cache memory system for facilitating access by a processor to lines of data stored in a memory, the system comprising:

a cache for storing lines of data for access by the processor;

a prefetch store for storing lines of data to be transferred to the cache; and

a memory controller for receiving processor requests for access to lines of data and retrieving lines of data from the memory, the memory controller maintaining priority data indicative of the relative priority of lines of data stored in the cache;

wherein the memory controller is configured to respond to receipt of a processor request for access to data in a line N such that:

in the case of a cache hit, the memory controller controls supply of that data from the cache to the processor;

in the case of a cache miss when line N is not stored in the prefetch store, the memory controller retrieves line N from the memory, and controls storage of the line in the cache and supply of the data to the processor, the priority data for line N being set to a high relative priority;

in the case of a cache miss when line N is stored in the prefetch store, the memory controller transfers line N from the prefetch store to the cache and controls supply of the data to the processor, the priority data for line N being set to a low relative priority, and

for both a cache hit and a cache miss, the memory controller prefetches the sequentially next line N+1 from the memory to the prefetch store.

2. A system according to claim 1 wherein the memory controller is configured to prefetch line N+1 only for lines in a defined subset of the lines stored in the memory.

3. A system according to claim 2 wherein said subset is the set of lines stored in a defined region of the memory.

4. A system according to claim 2 wherein the memory controller is configured to prefetch line N+1 when line N+1 is a line in said subset.

5. A system according to claim 1 wherein the memory controller is configured to prefetch line N+1 only if line N+1 is not stored in the cache.

6. A system according to claim 1 wherein the memory controller is configured to prefetch line N+1 only if line N+1 is not stored in the prefetch store.

7. A system according to claim 1 wherein the memory 
controller maintains said priority data in accordance with a 
Least Recently Used system whereby the priority data 
generally indicates relatively how recently lines stored in the 
cache have been accessed by the processor. 

8. A system according to claim 7 wherein said high 
relative priority indicates a most recently used line and said 
low relative priority indicates a least recently used line. 

9. A system according to claim 1 wherein the cache 
comprises a set associative cache. 

10. Processing apparatus comprising a processor, a 
memory for storing lines of data to be accessed by the 
processor, and a cache memory system connected between 
the processor and the memory, the cache memory system 
comprising: 

a cache for storing lines of data for access by the proces- 
sor; 

a prefetch store for storing lines of data to be transferred 
to the cache; and 

a memory controller for receiving processor requests for 
access to lines of data and retrieving lines of data from 
the memory, the memory controller maintaining prior- 
ity data indicative of the relative priority of lines of data 
stored in the cache; 

wherein the memory controller is configured to respond to 
receipt of a processor request for access to data in a line 
N such that: 

in the case of a cache hit, the memory controller controls 
supply of that data from the cache to the processor; 

in the case of a cache miss when line N is not stored in the 
prefetch store, the memory controller retrieves line N 
from the memory, and controls storage of the line in the 
cache and supply of the data to the processor, the 
priority data for line N being set to a high relative 
priority; 

in the case of a cache miss when line N is stored in the 
prefetch store, the memory controller transfers line N 
from the prefetch store to the cache and controls supply of the data to the processor, the priority data for line N being set to a low relative priority, and

for both a cache hit and a cache miss, the memory controller prefetches the sequentially next line N+1 from the memory to the prefetch store.

11. Apparatus according to claim 10 wherein said cache is a first level cache of the processor.

12. Apparatus according to claim 10 wherein said cache is an instruction cache of the processor.

13. Apparatus according to claim 10 wherein said cache comprises a set associative cache.

14. Apparatus according to claim 10 wherein said memory is a main memory of the processor.

15. Apparatus according to claim 10 wherein the memory controller is configured to prefetch line N+1 only for lines in a defined subset of the lines stored in the memory.

16. Apparatus according to claim 15 wherein said subset is the set of lines stored in a defined region of the memory.

17. Apparatus according to claim 16 wherein lines of data deemed suitable for prefetching are stored in said defined region of the memory, and lines of data deemed unsuitable for prefetching are stored in another region of the memory.

18. Apparatus according to claim 15 wherein the memory controller is configured to prefetch line N+1 when line N+1 is a line in said subset.

19. Apparatus according to claim 10 wherein the memory controller is configured to prefetch line N+1 only if line N+1 is not stored in the cache.

20. Apparatus according to claim 10 wherein the memory controller is configured to prefetch line N+1 only if line N+1 is not stored in the prefetch store.

21. Apparatus according to claim 10 wherein the memory controller maintains said priority data in accordance with a Least Recently Used system whereby the priority data generally indicates relatively how recently lines stored in the cache have been accessed by the processor.

22. Apparatus according to claim 21 wherein said high relative priority indicates a most recently used line and said low relative priority indicates a least recently used line.

23. A method for facilitating access by a processor of a data processing system to lines of data stored in a memory of the system, wherein the system includes a cache for storing lines of data for access by the processor and a prefetch store for storing lines of data to be transferred to the cache, and wherein the processor generates requests for data to which access is required, the method comprising the steps of maintaining in the system priority data indicative of the relative priority of lines of data stored in the cache, and responding to a processor request for access to data in a line N by:

determining whether line N corresponds to a cache hit or a cache miss;

in the case of a cache hit, supplying the requested data to the processor;

in the case of a cache miss, determining whether line N is stored in the prefetch store;

when line N is not stored in the prefetch store on a cache miss, retrieving line N from the memory, and storing the line in the cache and supplying the requested data to the processor, the priority data for line N being set to a high relative priority;

when line N is stored in the prefetch store on a cache miss, transferring line N from the prefetch store to the cache and supplying the requested data to the processor, the priority data for line N being set to a low relative priority; and

for both a cache hit and a cache miss, prefetching the sequentially next line N+1 from the memory to the prefetch store.

24. A method according to claim 23 wherein said prefetching of line N+1 is only performed for lines in a defined subset of the lines stored in the memory.

25. A method according to claim 24 wherein said subset is the set of lines stored in a defined region of the memory.

26. A method according to claim 25 including the step of storing lines of data which are suitable for prefetching in said region of the memory and storing lines of data which are unsuitable for prefetching outside said region.

27. A method according to claim 24 wherein prefetching of line N+1 is performed if line N+1 is a line in said subset.

28. A method according to claim 23 wherein prefetching of line N+1 is only performed if line N+1 is not stored in the cache.

29. A method according to claim 23 wherein prefetching of line N+1 is only performed if line N+1 is not stored in the prefetch store.

30. A method according to claim 23 including maintaining said priority data in accordance with a Least Recently Used algorithm whereby the priority data generally indicates relatively how recently lines stored in the cache have been accessed by the processor.

31. A method according to claim 30 wherein said high relative priority indicates a most recently used line and said low relative priority indicates a least recently used line.

32. A method according to claim 23 wherein said lines of data comprise lines of instructions to be implemented by the processor.

33. A method according to claim 32 wherein said prefetching of line N+1 is only performed for lines stored in a defined region of the memory, and wherein the method includes the step of configuring said lines of instructions for sequential execution and storing said lines in said region of the memory.